ITHIM

Rob Johnson, Ali Abbas

21 October, 2021

Introduction

Summary of literature

Summary of aims

Overview

The remainder of the document presents a current working version of the ITHIM, and is a comprehensive record of all the calculations of which it is composed. The equations for the ITHIM are described in Section 2 (tabulated in Section 7). The method is concretised through the example application to Accra in Section 3 and particular columns in Tables 1 and 3, with results presented in Section 19.

The ITHIM as described here is implemented in the ITHIM-R R package (https://github.com/ITHIM/ITHIM-R). Other cities are also included in that package (São Paulo, Delhi, Bangalore), and reference will be made to those where relevant.

Method: ITHIM

In the ITHIM, we combine many information inputs and compute the health burden through three pathways (depicted in Figure 1). Each pathway has its own set of definitions and outputs, and we refer to them as “modules.”

![ITHIM workflow. Inputs to the model have a bold, black outline. The graph depicts parent–child relationships, where a child (at the head of an arrow) depends upon all its parents (at the source(s) of the arrow(s)). To aid visualisation, an arrow connected to the outside of a grey box indicates that the arrow connects to each item within the box. *AT: active travel. MET: metabolic equivalent task. pp: per person. AP: air pollution. PA: physical activity. DR: dose–response relationship. GBD: global burden of disease. BOD:

To describe the model, we use lower-case letters to denote indices, or dimensions, of objects. They take one of a set of possible values, detailed in Table 1. The set of input items are denoted by capital letters, and variable parameters are denoted by Greek letters. These are presented in Table 3. The equations of the ITHIM are presented below, and summarised in Table 4. These use letter modifiers to show how the inputs are modified in the course of output generation.

We calculate the ITHIM by combining the modules’ outputs in terms of the health burden (\(\hat{U}_{a,g,h,o,s}\)). Other outputs of the model include: MMETs per person (\(M_{i,s}\)), PM2.5 in each scenario (\(\bar{P}_s\)), PM2.5 per person (\(\check{W}_{i,s}\)), and injuries burden (\(\check{I}_{a,g,m,o,s}\)).

To assess uncertainty, we sample from the ITHIM output by sampling uncertain parameters multiple times and calculating the ITHIM. We specify parametric distributions or sampling strategies for all the uncertain parameters. Evaluating the ITHIM with uncertain parameters allows assessment of their impact on the outcome (AKA sensitivity analysis). We use EVPPI to calculate the expected reduction in uncertainty in the outcome were we to learn a parameter perfectly. This means we can implement models that are basic in their parametrisation, and learn at the end for which parameters it would be worthwhile spending dedicated time learning better.

Indices used in the ITHIM variables. The “Label” is the subscript used to refer to the index in this document. The “Name” is descriptive. The “Levels” are the levels used in the Accra model.
Label Name Levels
\(a\) Age group 15–49
50–69
\(g\) Gender Male
Female
\(h\) Disease
\(i\) Individual index
\(j\) Trip index
\(m\) Transport mode walk
cycling (cyc.)
bus
car
motorbike (mot.)
goods vehicles (GV)
subway (sub.)
\(o\) Outcome Death
YLL
\(s\) Scenario
\(w\) AP RR hyperparameter \(1,2,3,4\)
\(x\) PA RR hyperparameter \(1,2,3\)
\(z\) PA dose \(0,...,300\)
Common abbreviations.
Abbreviation Meaning
AP Air pollution
CDF Cumulative distribution function
COPD Chronic obstructive pulmonary disease
DR Dose response
EVPPI Expected value of perfect partial information
GAM Generalised additive model
GBD Global burden of disease
IHD Ischemic heart disease
ITHIM Integrated transport and health impact model
LRI Lower respiratory infection
MMET Marginal metabolic equivalent task
PA Physical activity
PAF Population attributable fraction
PIF Potential impact fraction
PM2.5 Particulate matter (diameter \(< 2.5 \mu\)m)
PT Public transport
RR Relative risk
VOI Value of information
YLL Years of life lost
All values input to the ITHIM model.
\[\[inputs\]]{label= “inputs”} ITHIM-R reference
Name Description Value in Accra model Label Model/ Setting
Global values
\(U_ { a,g,h,o}\) Background burden of disease Constant GBD_DATA Setting
\(N_{a,g}\) Population by age and gender in GBD_DATA Constant GBD_DATA Setting
\(\bar { N}_{a,g}\) Population by age and gender Constant DEMOGRAPHIC Setting
\(\rho_h\) Chronic disease scalar 1 or Lnorm(0,0.18) CHRONIC_DISEASE_SCALAR Setting
Travel values
\(Z _ {m=walk}\) Walk speed 4.8 MODE_SPEEDS Setting
\(Z _ {m=cyc.}\) Cycle speed 14.5 MODE_SPEEDS Setting
\(Z _ {m=bus}\) Bus speed 15 MODE_SPEEDS Setting
\(Z _ {m=car}\) Car speed 21 MODE_SPEEDS Setting
\(Z _ {m=mot.}\) Motorbike speed 25 MODE_SPEEDS Setting
$\epsilon $ Walk time to bus 5 or Lnorm(log (5),0.18) BUS_WALK_TIME Setting
\(\mu_{m=HGV}\) Truck distance relative to car 0.21 or Beta(3,10) TRUCK_TO_ CAR_RATIO Setting
\(\mu_{m=bus}\) Bus driver distance relative to bus passenger distance 1/45 or Bet a(20,600) | BUS_TO_PASSENGER_RATIO Setting
\(\mu_ {m=mot.}\) Motorcycle fleet distance relative to total motorcycle distance 37/68 or Beta(37/3+ 1,31/3+1) | FLEET_TO_MOTORCYCLE_RATIO Setting
\(T_j\) Trip-level data Constant SYNTHETIC_TRIPS Setting
Injury model values
\(I _ {a,g,m_{\ text{vi c} },m_{\text{str}}}\) Fatalities table Constant INJURY_TABLE Setting
\(\sigma\) Injury reporting rate 1 or Beta(8,3) INJURY_REPORTING_RATE Setting
\(\omega\) Sum of distance exponents 1.9 or Lnorm(l og(1.9),l og(1.03)) SIN_EXPONENT_SUM Model
\(\psi\) Casualty fraction of \(\omega\) 0.5 or Beta(20,20) | CASUALTY_EXPONENT_FRACTION Model
  • Pollution model

    values*

\(\eta\) Background PM2.5 concentration 50 or Lnorm(log( 50),0.18) | PM_CONC_BASE Setting
\(\zeta\) Fraction of measured PM2.5 concentration due to road transport 0.225 or Beta(5,20) | PM_TRANS_SHARE Setting
\(\ g amma_{m}\) Vehicle emission inventory Constant or Dirichlet EMISSION_INVENTORY Setting
\(P\) Vehicle emission confidence 1 EMISSION_INVENTORY_CONFIDENCE Setting
  • Pollution & health model

    values*

\(C_1\) Base-level inhalation rate 1 BASE_LEVEL_INHALATION_RATE Model
\(C_2\) Exposure ratio window closed 0.5 CLOSED_WINDOW_PM_ RATIO Model
\(C_3\) Proportion of travel with closed windows 0.5 C LOSED_WIN DOW_RATIO Setting
\(C_4\) Parameter for on-road PM2.5 level 3.216 ROAD_ RATIO_MAX Model
\(C_5\) Parameter for on-road PM2.5 level 0.379 ROAD_RA TIO_SLOPE Model
\(C_6\) Exposure ratio for underground 0.8 SUBWAY _PM_RATIO Setting
$ H _{a,h,w}$ AP dose- -res ponse–curve parameters Constant DR_AP Model
\(\ xi_{h,w}\) Quantiles for AP RR functions 0.5 or Unif(0,1) AP_DOSE_R ESPONSE_ QUANTILE Model
Physical activity model values
\(\chi\) Day-to-week travel scalar 7 or 7 \(\times\)Beta(20,3) DAY_TO_WEEK_TRAVEL_SCALAR Model
\(Y_i\) Individual n on-transport MMETs Constant PA_SET$ work_ltpa _marg_met Setting
\(\theta\) Background PA scalar 1 or Lnorm(0,0.18) BACKGROUND_ PA_SCALAR Setting
\(\ tilde{Y}\) Confidence in PA survey 1 BACKGROUND_PA_C ONFIDENCE Setting
\(\tilde {\theta}\) Background PA zeros quantile 0.5 or Unif(0,1) BACKGROUND _PA_ZEROS Setting
\(\lambda_{m=\text{cyc.}}\) Cycling MET surplus 4.63 or Lnorm(log (4.63),1) MMET_CYCLING Model
\(\lambda_{m=\text{walk}}\) Walking MET surplus 2.53 or Lnorm(log (2.53),1) MMET_WALKING Model
\(\lambda_{m\not\in\text{wal k,cyc.}}\) In-vehicle MET surplus 0 Model
Physical activity & health model values
$\G_{h,x,z}$ Look-up table for relative risk truncated normal distribution Constant Model
$_{h}$ Quantiles for PA RR functions 0.5 or Unif(0,1) PA_DOSE_RESPONSE_QUANTILE Model

Data module

Input data for the ITHIM are listed as Roman capital letters in Table 3. They include (vehicle) mode speeds, injury/fatality records, disease dose–response relationships, measures of air pollution, surveys of travel and physical activity, population numbers by demographic group, MMETs associated with active travel, and current burden of disease (see Figure 1).

Synthetic

We create a “synthetic population” by matching individuals surveyed for travel to the individuals surveyed for physical activity, based on age and gender. Then each person in the synthetic population has a set of trips taken and an amount of non-travel physical activity. This population is assumed to be representative of the population of the city or setting under study.

Alternatively/additionally

we could create a synthetic population as described in Appendix 8.3, which allows uncertainty in the propensity to travel. This method operates with person-level, rather than trip-level, information.

Scenarios

From the synthetic population we create “scenarios” of different pictures of travel by changing certain trips (e.g. swapping the mode of travel for particular trips and/or particular people). For the purposes of illustration, we choose a simple scenario for Accra in which every person gains one one-kilometre walk.

Distances

For the ITHIM, we process the travel data into distance data of various formats. First, we augment PT trips with a walk component: we take the set \(\{T_j:m(j)=\text{bus}\}\), which has \(J\) entries. We add \(J\) journeys with mode “walk” and duration \(\epsilon\) to the set \(T\). (This step would not be necessary for a travel survey which already includes all stages for each trip.)

Then we summarise the total distances and durations. The distance set is labelled \(\hat{T}_{m,s}\), representing the total distance travelled per mode per scenario, and the duration set is constructed analogously. We make an assumption that the distance for bus drivers scales linearly with bus passengers, where the distance in the baseline is defined based on the bus driver distance relative to car.

Distance and duration summaries include all person travel in the synthetic population and, in addition, some trips that are attributed to no-one in the synthetic population. These trips are used to represent “fleet” travel, which is uncaptured in travel surveys, and unchanged in scenarios. It’s important to include these trips because for modes such as motorcycle, where a considerable component of travel comes from fleet, attributing all the harms associated with the mode to only those surveyed dramatically overestimates the contribution of the individuals. In scenarios, then, we would be predicting emissions and injury rates that are far too high per person.

AP module

Background PM2.5

We use the total distance by mode (\(\hat{T}_{m,s}\)) to calculate the total PM2.5 in the scenario (\(\bar{P}_s\)). First we calculate the emission inventory in terms of fractions. If the confidence \(P\) is 1 then we use \(\dot{P}_{m}=\gamma_m\) directly. If \(P<1\), we sample the emission inventory fractions as follows: \[\tilde{\gamma}_m \sim \text{Dir}(f(P,\gamma_m)),\] \[\dot{P}_m=\tilde{\gamma}_m\] where we choose some mapping function such as \(f(P,\gamma_m)=\frac{\gamma_m}{\sum_m\gamma_m}10^{5P}\) (Figure 5).

Then we multiply vehicle distance by vehicle emission factor (where the emission factor is the emission inventory (fraction) divided by distance at baseline). \[\tilde{P}_{m,s}=\dot{P}_m\frac{\hat{T}_{m,s}}{\hat{T}_{m,s=\text{baseline}}}.\]

For the baseline, \(\tilde{P}_{m,s}\) is equal to the emission inventory fraction. For each scenario, it is scaled. We take the sum over modes to get a scalar for emissions in scenarios. \[\hat{P}_{s}=\sum_{m} \tilde{P}_{m,s}.\] Then the background PM2.5 in each scenario is the sum of the transport component and the non-transport component: \[\bar{P}_s=\eta(\zeta\hat{P}_s + 1-\zeta).\]

PM2.5 per person

Dose–response relationships are based on background concentrations in cities, and not on personal exposures. In our person-level model, we expect that individuals will have different exposures based on their different travel behaviours. While we can neither estimate individual exposures nor map them via any dose–response relationship to a relative risk, we aim to capture interpersonal differences via assumptions about the mechanistic relationships between behaviours and exposures, and anchor those to the background air pollution in their locality, for which dose–response estimates of relative risk exist.

Therefore, when we write “individual exposure,” it should not be read as an aetiologically relevant exposure, but rather it should be read as a unit- and scale-agnostic metric that maps an individual to a particular dose with reference to city-level measurements of air pollution. New research is required to elicit dose–response relationships between individual health exposures and health outcomes \[citation from JPO?\]. Our approximations act as assumptions for now and placeholders until more detailed data are available. \[Perhaps we should use words other than "individual exposure". How about "individual PM2.5 reference values"?\]

Individual exposures to PM2.5 are calculated using the background PM2.5 (\(\bar{P}_s\)) and the trip sets. There are three major components to daily exposure: one, a person’s total inhalation off road; two, a person’s inhalation on road in a vehicle, and, three, a person’s inhalation on road while undertaking active transport. Each category has an amount of background PM2.5 and a ventilation rate which together inform overall exposure.

The ratio of exposure off road to that on road is a function of total PM2.5, defined as \[K_s=C_4-C_5\log(\bar{P_s})\] as in Goel et al. (2015). This defines the exposure of a person in an open vehicle (i.e. pedestrian, cyclist or motorist): \[\tilde{K}_{m\in\{\text{walk,cyc.,mot.}\},s}=K_s\] and it is used to calculate in-vehicle exposure, assuming an exposure of \(C_2\) with the window closed, and a proportion \(C_3\) of vehicles having closed windows: \[\tilde{K}_{m\not\in\{\text{walk,cyc.,mot.,sub.}\},s}=C_2C_3+K_s(1-C_3).\] The exposure in a subway is constant and not dependent on the road ratio \(K_s\): \[\tilde{K}_{m=\text{sub.},s}=C_6.\]

Ventilation rates are calculated for each mode, assuming a base-level inhalation rate of \(C_1\): \[V_m=C_1+\frac{1}{2}\lambda_m,\] where \(\lambda_m\) is the MMETs for mode \(m\).1 Then the air inhaled during travel per person is \[\tilde{V}_{i,m,s} = \smashoperator{\sum_{\substack{j:i(j)=i,\\m(j)=m,\\s(j)=s}}}T_j\cdot V_{m(j)}/Z_{m(j)}\] and the rate of inhalation during travel is \[\bar{V}_{i,s} = \sum_{m}\tilde{V}_{i,m,s}\cdot \tilde{K}_{m,s}.\] The air inhaled when not travelling is \[\hat{V}_{i,s} = C_1\left(24-\smashoperator{\sum_{\substack{j:i(j)=i,\\s(j)=s}}}T_j/Z_{m(j)}\right).\] Together, the PM2.5 exposure is calculated as the total PM2.5 inhaled per hour as follows: \[\check{W}_{i,s} = \frac{\bar{P}_s(\bar{V}_{i,s}+\hat{V}_{i,s})}{24}.\]

AP–disease dose–response relative risk

We use each person’s exposure to PM2.5 (\(\check{W}_{i,s}\)) to calculate their relative risk of five diseases (IHD, lung cancer, COPD, stroke, LRI), using curves parametrised by four disease-specific variables (Burnett et al. 2014). Of the five diseases, two (IHD and stroke) have parameters specific to age groups starting at age 25.2 The other three (lung cancer, LRI and COPD) have one set of parameters for all ages.

The curves are in the form of samples of the set of four parameters. We model the densities of these samples (using a quantile for parameter 3, kernel density estimation for parameter 2, and GAMs for parameters 1 and 4)3 in order to draw either the median or random samples via their quantiles (\(\xi_{h,w}\)) as follows:

\[\begin{aligned} \tilde{H}_{a,h,w=3}=&\text{CDF}_{H_{a,h,w=3}}^{-1}(\xi_{h,w=3})\\ \tilde{H}_{a,h,w=2}=&\text{CDF}_{H_{a,h,w=2}|\tilde{H}_{a,h,w=3}}^{-1}(\xi_{h,w=2})\\ \tilde{H}_{a,h,w=1}=&\text{CDF}_{H_{a,h,w=1}|\tilde{H}_{a,h,w=2},\tilde{H}_{a,h,w=3}}^{-1}(\xi_{h,w=1})\\ \tilde{H}_{a,h,w=4}=&\text{CDF}_{H_{a,h,w=4}|\tilde{H}_{a,h,w=3},\tilde{H}_{a,h,w=2},\tilde{H}_{a,h,w=1}}^{-1}(\xi_{h,w=4}).\end{aligned}\]

From these parameters, the relative risk of mortality is defined \[\check{H}_{h,i,s} = 1 + \tilde{H}_{a=a(i),h,w=1}\times \left\{1 - \exp\left(-\tilde{H}_{a=a(i),h,w=2} (\check{W}_{i,s} - \tilde{H}_{a=a(i),h,w=4}) ^ {\tilde{H}_{a=a(i),h,w=3} }\right)\right\}.\] For \(h\not\in\{\text{IHD, lung cancer, COPD, stroke, LRI}\}\), \(\check{H}_{h,i,s} \equiv 1\).

PA module

Individual-level MMETs

Using trip sets and the synthetic population, we calculate total MMETs per person (\(M_{i,s}\)) as the sum of walking MMETs per day and cycling MMETs per day, which are scaled up to a week via the scalar \(\chi\), and work/leisure MMETs per week as follows: \[M_{i,s}=\smashoperator{\sum_{\substack{j:i(j)=i,\\s(j)=s}}} \chi\lambda_{m(j)}\frac{T_j}{Z_{m(j)}}+\theta Y_i.\]

Here, \(\lambda_m\) is the MMETs for mode \(m\), \(T_j\) the distance of trip \(j\), \(Z_m\) the speed of mode \(m\), \(\theta\) the scalar for background PA, and \(\check{Y}_i\) the amount of work and leisure (background) PA of person \(i\).

We calculate \(\check{Y}_i\) according to the confidence in the PA survey, \(\tilde{Y}\). First, we calculate the raw probability that a person in demographic group completes no non-travel PA: \[\dot{Y}_{a,g}=\text{Prob}(Y_i=0)|_{a(i)=a,g(i)=g}.\] Then, we set the probability to use, \(\bar{Y}_{a,g}\), as \(\dot{Y}_{a,g}\) if our confidence \(\tilde{Y}\) is 1. If \(\tilde{Y}<1\), we map the confidence to parametrise a Beta distribution somehow, e.g. (Figure 4) \[\begin{aligned} \hat{Y} &= 500^{\tilde{Y}+0.2},\\ \beta & = \hat{Y}\dot{Y}_{a,g}\left(\frac{1}{\dot{Y}_{a,g}} -1\right),\\ \alpha & = \hat{Y} - \beta,\\ \bar{Y}_{a,g} & = \text{CDF}_{\text{Beta}(\alpha,\beta)}^{-1}(\tilde{\theta}).\end{aligned}\]

Finally, for each person in the population, we sample non-travel MMETs as zero with probability \(\bar{Y}_{a,g}\) and from the raw non-zero density of their demographic group with probability \(1-\bar{Y}_{a,g}\).

PA–disease dose–response relative risk

We use each person’s MMETs per week (\({M}_{i,s}\)) to calculate their relative risk of six diseases, using curves found from meta analysis. Each disease (except type 2 diabetes) has a threshold beyond which there is no further change in relative risk. (Total cancer, breast cancer, colon & rectum cancer, endometrial cancer, & coronary heart disease: 35; lung cancer: 10; stroke: 32; all cause: 16.08.) For all individual cancers, we estimate the burden in terms of . For all other diseases, we use mortality.

The interpolation of the dose–response curve can be described as follows: \[\tilde{G}_{h,x}(m) = {G}_{h,x,z=\lfloor{m}\rfloor}+({G}_{h,x,z=\lceil{m}\rceil}-{G}_{h,x,z=\lfloor{m}\rfloor})\frac{m-\lfloor{m}\rfloor}{\lceil{m}\rceil-\lfloor{m}\rfloor}\] where we interpolate the mean, the lower bound, and the upper bound. These define the normal from which we sample, truncated at 0: \[F_S(v_x) = \mathcal{N}\left.\left(v_{x=2},\frac{v_{x=3}-v_{x=1}}{1.96}\right)\right|_{F_S(v_x)>0}.\]

Then the relative risk for each person for each disease is calculated for the quantile \(\phi_h\) as \[W_{h,i,s}=\text{CDF}_{F_S\left(\tilde{G}_{h,x}(M_{i,s})\right)}^{-1}\left(\phi_{h}\right).\] See Figure 2 for an example. For \(h\not\in\) {diabetes, total cancer, breast cancer, colon & rectum cancer, endometrial cancer, IHD, lung cancer, stroke, all cause}, \(W_{h,i,s} \equiv 1\).

\[Example of the dose--response workflow for PA and total cancer. Top left: results of the meta-analysis: mean, lower bound and upper bound of the relationship between PA dose and relative risk of disease. Top right: examples of 500 samples from the normal distribution defined by the mean of the top-left panel and the standard deviation defined by (upper bound minus lower bound) divided by 1.96 and truncated at 0. Bottom left: relative risks of individuals with PA doses 5, 25 and 45 are found by mapping their doses onto the median curve. This is used in the "constant" ITHIM use case. Bottom right: relative risks of individuals with PA doses 5, 25 and 45 for three different samples from the distribution shown in the top-right panel. For each sample, each individual is mapped onto the response curve it defines.\]

Injury module

Processing

For the injury module, we require distance per travel mode per demographic group per scenario. It requires matching of mode names in the travel survey to mode names in the injury records, for example “taxi,” “shared auto,” “shared taxi” and “car” combine to form “car.” Some work is required to separate drivers from passengers; we currently assume all travellers are drivers, with the exception of “bus.”4

Modelling

We model injuries via regression by predicting the number of fatalities of each demographic group on each mode, which we calculate as a sum over all the ways in which they might have been injured (i.e., all the modes with which they might have collided).

The predictive covariates include the distances travelled by the parties and their demographic details. These requirements lead to a natural separation of the (training) dataset into two groups: a set for which we have distance data for the other party, and a set for which we do not have distance data for the other party. The former equates to a “who hit whom” (“whw”) matrix (albeit in a higher dimension), and will account for changes in injuries resulting of a change in strike-mode travel. The latter corresponds to causes of injury that will not change across scenarios, including “no other vehicle” and modes of transport that we do not consider to change, which might include trucks and buses if they are not somehow explicitly included in the trip set. We label this group “noov”: no or other vehicle.

Note on recategorization: for some of the modes (like for car, bus, motorcycle and cycle), if both strike and victim modes are same, we explicitly make the strike mode as ‘noov’

We model the number of injuries as a Poisson-distributed variable with an offset depending on the distance(s) and the reporting rate. We write the model as follows: \[\begin{aligned} \tilde{I}_{a,g,m_{\text{cas}},m_{\text{str}},s,y} \sim & \text{Poisson}(\bar{I}_{a,g,m_{\text{cas}},m_{\text{str}},s}); \\ % 'count~ \log\left(\bar{I}_{a,g,m_{\text{cas}},m_{\text{str}},s}|m_{\text{str}}\in\{m_{\text{whw}}\}\right) = & % cas_mode*strike_mode-log(injury_reporting_rate)+log(weight) {\kappa_0} +\sum_{i=1}^n\kappa_iX_i-\log(\sigma)+\nonumber\\ % +log(cas_distance) & \log(\bar{T}_{a,g,m_{\text{cas}},s}) % +(CAS_EXPONENT-1)*log(cas_distance_sum) +(E_{m_{\text{cas}}}-1)\log(\hat{T}_{m_{\text{cas}},s}) %+log(strike_distance)+(STR_EXPONENT-1)*log(strike_distance_sum) +E_{m_{\text{str}}}\log(\hat{T}_{m_{\text{str}},s});\\ \log\left(\bar{I}_{a,g,m_{\text{cas}},m_{\text{str}},s}|m_{\text{str}}\in\{m_{\text{noov}}\}\right) =& {\kappa'_0} +\sum_{i=1}^n\kappa'_iX'_i -\log(\sigma) % +log(cas_distance) + \log(\bar{T}_{a,g,m_{\text{cas}},s}) % +(CAS_EXPONENT-1)*log(cas_distance_sum) +(E_{m_{\text{cas}}}-1)\log(\hat{T}_{m_{\text{cas}},s}).\end{aligned}\] We choose this form of equation to enable (a) linearity in injuries with respect to total travel combined across modes and (b) linearity in injuries as subdivided by travellers of each mode.5 Note that this methodology is under development. See https://github.com/robj411/safety_in_numbers_power_law.

In general, one makes the best model that one can with the data available. We use, for Accra, the covariates casualty mode*strike mode, casualty age, and casualty gender, to form the model matrix \(X\). We also have another covariate, year, but our travel data are for one year only, so we model instead many observations of the same quantities.

We fit the coefficients \(\kappa\) of the model using data for ten years, assuming the same travel each year, which corresponds to the scenario \(s=\text{baseline}\). To incorporate the injury reporting rate (\(\sigma\)), we set it as an offset, whose value is 1 in fitting the model, and as specified in making predictions.

Prediction

We predict the number of injuries in each scenario based on the training model built from the baseline scenario. For Accra, we predict for the year 2016, using scenario-specific travel data. The number of fatalities is taken as the sum of fatalities over the strike modes: \[\check{I}_{a,g,m,o=\text{death},s}=\sum_{m_{\text{str}}}\tilde{I}_{a,g,m_{\text{cas}},m_{\text{str}},s}.\] We extrapolate injury deaths to injury YLLs via the ratio in the GBD: \[R_{a,g}=U_{a,g,h=\text{road injuries},o=\text{YLL}}/U_{a,g,h=\text{road injuries},o=\text{death}},\] \[\check{I}_{a,g,m,o=\text{YLL},s}=\check{I}_{a,g,m,o=\text{death},s}\cdot R_{a,g}.\]

Health module

PA and AP relative risks combined

We combine the relative risks of disease as a function of AP and disease as a function of PA through multiplication: \[\tilde{W}_{h,i,s}=W_{h,i,s}\cdot\check{H}_{h,i,s}.\] Not all diseases have a dose–response relationship for both AP and PA. Just stroke, lung cancer, and IHD have both. For the other diseases, only one of \(W_{h,i,s}\) and \(\check{H}_{h,i,s}\) is different from one.

Total disease burden

We calculate the total health burden relative to the reference scenario (which, for our Accra example, is the baseline) using the injury and health outputs combined with GBD data, via population attributable fractions (PAF). The PAF is defined: \[\hat{W}_{a,g,h,s}=\smashoperator{\sum_{\substack{i:a(i)=a,\\g(i)=g}}}\tilde{W}_{h,i,s}.\]

We then calculate the PAF relative to the reference scenario (“ref”) as: \[\bar{W}_{a,g,h,s}= \frac{\hat{W}_{a,g,h,s=\text{ref}}-\hat{W}_{a,g,h,s}}{\hat{W}_{a,g,h,s=\text{ref}}}.\]

We estimate the background burden of disease using GBD data and scaling based on the ratio of populations between country and city. For country populations \(N_{a,g}\), city populations \(\bar{N}_{a,g}\) and country burden \(U_{a,g,h,o}\), we estimate city burden as \[\bar{U}_{a,g,h,o}=U_{a,g,h,o}\frac{\bar{N}_{a,g}}{N_{a,g}}.\]

If we are scaling the background burden of non-communicable diseases (“Neoplasms,” “Ischemic heart disease,” “Tracheal, bronchus, and lung cancer,” “Breast cancer,” “Colon and rectum cancer,” “Uterine cancer”), we do so here. (Otherwise set \(\rho\equiv 1\).) \[\tilde{U}_{a,g,h,o}=\rho_h\cdot \bar{U}_{a,g,h,o}.\]

We combine the burden with the PAF: \[\hat{U}_{a,g,h,o,s}=\bar{W}_{a,g,h,s}\cdot\tilde{U}_{a,g,h,o}.\]

Injury burden

And, finally, for the injuries, we sum over modes to compute the burden, \[\bar{U}_{a,g,o,s}=\sum_m\check{I}_{a,g,m,o,s},\] and subtract from the values for the reference scenario: \[\hat{U}_{a,g,h=\text{road injury},o,s}=\bar{U}_{a,g,o,s=\text{ref}}-\bar{U}_{a,g,o,s}.\]

Example: Accra

Here we show the ITHIM as evaluated for the city of Accra, with a simple scenario of every person in the synthetic population gaining a single, 1 km walk. We define XX uncertain univariate parameters (Section 3.1), two confidences, and two uncertain parameter sets (Section 3.3).

Parametric distributions for uncertain variables

The XX uncertain univariate parameters are shown in Figure 3. Most of the parameters are self explanatory. Cycling and walking MMETs are the number of MMETs per hour when undertaking cycling and walking, and determine also the ventilation rates. Motorcycle distance is the total distance travelled by motorcycles relative to the total distance travelled by cars in the baseline scenario. Non-travel PA, injury reporting rate and NCD burden all act as scalars for the corresponding datasets. Note that the non-travel PA scalar does not affect the \(\approx\)40% of the population whose non-travel PA is 0.

(){#distributions width=“95%”}

Confidences

For physical activity propensity and emission inventories, we use “confidences,” values between 0 and 1 that represent how confident we are about the data source. We map these values to Beta (Figure 4) and Dirichlet (Figure 5) distributions, respectively. The Beta-distributed propensity towards non-travel physical activity is described in Section 2.3.1. The Dirichlet-distributed emissions inventory is described in Section 2.2.1.

\[Distributions of a Beta-distributed propensity to non-travel PA with varying confidence values in the raw input data. The sampled value gives the fraction of the population that engages in non-travel PA.\]

(){#prob_zero_PA width=“50%”}

## [1] "Distributions of Dirichlet-distributed emission inventories for four modes and varying confidence values. Raw input values were 4, 4, 20, and 60."

(){#emission_dist width=“65%”}

Dose–response relationships

For the dose–response relationships between physical activity (PA) and disease and air pollution (AP) and disease, we assume that there is uncertainty, but no variability, in the relationship. This means that we sample a relationship from the distribution of relationships, and apply that relationship to all individuals precisely. This means that, given fixed doses, responses between individuals will be perfectly correlated.

We achieve this by use of the probability integral transform: we sample a random variable uniformly distributed on the space (0,1) and map it, via a cumulative distribution function, to the distribution describing the dose–response relationship. See Figure 6 for an illustration.

We use the same uniform samples to generate random variables from different distributions. It means that the samples are perfectly correlated.

We use the same uniform samples to generate random variables from different distributions. It means that the samples are perfectly correlated.

(){#cdf width=“80%”}

Physical activity

Each disease’s PA dose–response relationship is defined by a truncated normal. For each dose, there is a mean value, an upper bound, and a lower bound. For each person’s dose, we get the response by mapping the uniform random variable onto the truncated normal defined by the mean and bounds for that dosage. We use an R package called {drpa}, whose description can be seen here: https://meta-analyses.github.io/drpa/drpa.html

The relationships comes out of our work Garcia et al. (2018), where we look at the association of physical activity with various diseases.

Air pollution

For the AP relationship, there are four parameters per disease. We sample the first from an empirical distribution using the probability integral transform. We sample the second via the same method, conditioned on the value of the first, constructing their joint density with e.g. kde2d. The third parameter is sampled conditioned on the first and second, constructing their joint density using a GAM. The final parameter is sampled conditioned on the first, second, and third, constructing their joint density using a GAM.

As before, there is perfect correlation between individuals, i.e. if person A’s dose is greater than person B’s, then person A’s response is strictly greater than person B’s response.

The empirical distributions come from Burnett et al. (2014). There are four parameters per disease: IHD, lung cancer, COPD, stroke, and LRI. In addition, for stroke and IHD, there is a set of four parameters for each age group from 25 to 95 in five-year increments. In addition to our assumption that there is perfect correlation between individuals for diseases, we assume perfect correlation between ages for diseases. I.e., our four quantiles per disease will be applied to all age groups.

Results

Health burdens in scenarios

YLLs in the scenario minus YLLs in the baseline for Accra

Value of information

We calculate value of information following Jackson et al. (2017). The “value of information” we implement is the expected value of perfect partial information (EVPPI). It is the expected amount by which variation in the outcome diminishes should a parameter, or set of parameters, be known perfectly. With reference to information theory, it is the mutual information between the parameter, or parameter set, and the outcome, where the outcome is described only by its second moment (its variance).

The objective of learning the value of information formulated in this way is to identify the parameter or parameters that most drive variability (variance) in the outcome. The results guide future research, identifying which area or areas offer the greatest benefit in terms of prediction accuracy given further research. For example, for our model of Accra, learning better the dose–response relationship between physical activity and stroke (incidence) would likely increase precision of our estimate of total years of life lost (YLL) more so than learning any other parameter (Figure 8).

\[EVPPI for Accra's "walking" scenario for all causes of YLLs excluding "all cause" and "neoplasm".\]

ITHIM with parameters

::: appendix \[ITHIM-R workflow, with variable parameters in purple. Fixed inputs to the model have a bold, black outline, which is solid for inputs we consider "global" and dashed for inputs we consider "local". There are four global inputs, which will be embedded in ITHIM-R. There are seven local inputs, which users should provide. The graph depicts parent--child relationships, where a child (at the head of an arrow) depends upon all its parents (at the source(s) of the arrow(s)). To aid visualisation, an arrow connected to the outside of a grey box indicates that the arrow connects to each item within the box. *AT: active travel. MET: metabolic equivalent task. pp: per person. AP: air pollution. PA: physical activity. DR: dose--response relationship. GBD: global burden of disease.* NB: $\rho$ only impacts the NCD burden of GBD, not all of it. Missing parameter: motorcycle distance. $\lambda$ is a 2D parameter. $\xi$ is 4D.\]

(){#ithim_with_parameters width=“\textwidth”}

Dose–response relationships

\[1000 samples from PA dose--response curves\]

(){#PADR width=“90%”}

\[100 samples from AP dose--response curves\] (cvd_strokeDR-sim.pdf){width=“40%”}

Tabulated ITHIM equations

CDF: cumulative distribution function. We use generative distributions to generate random numbers so that samples are correlated. For \(t=\text{CDF}_{F}^{-1}(q)\), \(t\) is the \(q\)-th quantile of function \(F\) with CDF CDF\(_F\). (I’d like to find an improved notation for this.)

All calculations in the ITHIM.
Name Description Label
Scenario generation
Travel calculations
\(\{T_{j^+}\leftarrow \epsilon; m(j^+)\leftarrow\text{walk} \}\, \forall\, T_j:m(j)=\text{bus}\) New walk trips to bus walk to_bus_and combine_scen
\(\bar{T}_{a,g,m,s}=\sum_{j:a(j )= a,g(j)=g,m(j)=m,s(j)=s}Q_{i(j)}T_j\) (Weighted) total travel per age, gender, mode and scenario
$ ck{T}_{a,g,m,s}=$ Relative travel per mode and scenario by age and gender inj_distances\[\[1\]]
\(\hat{ T} _{m,s}=\sum_{a,g}\bar{T}_{a,g,m,s}\) Total travel per mode and scenario dist
\(\tilde{T}_{m,s}=\frac{\hat{T}_{ m, s}}{\hat{T}_{m,s=\text{baseline}}}\) Total travel per mode and scenario relative to baseline, with car modes summed, and Tuktuk = 1 inj_distances\[\[2\]]
Injury calculations
\(E_{m_{\text{cas}}}=\psi\omega\) Casualty exponent CAS_EXPONENT
\(E_{m_{\t ex t{str}}}=\omega-E_{m_{\text{cas}}}\) Striker exponent STR_EXPONENT
\(\tilde{I}_{a,g,m_{\text{vic}},m_{\text{str}},s}=\mathcal{S}\left(\sigma\c dot {I}_{a,g,m_{\text{vic}},m_{\text{| Fataliti str}}},\bar{T}_{a,g,m,s},E_{m_{\t ex t{cas}}},E_{m_{\text{str}}}\right)\) es in scenarios by strike mode | injuries
\(\check{I}_{a,g,m,o=\text{death },s} =\sum_{m_{\text{str}}}\tilde{I}_{ a, g,m_{\text{vic}},m_{\text{str}},s}\) Fatalities in scenarios v ictim_deaths
\(R_{a,g}=U_{a,g,h=\text{road injuries},o=\text{YLL}}/U_{a,g,h= \t ext{road injuries},o=\text{death}}\) Road YLL-to-death ratio GBD_INJ_YLL
\(\check{ I}_{ a,g,m,o=\text{YLL},s}=\check{I}_{ a, g,m,o=\text{death},s}\cdot R_{a,g}\) YLLs scaled from fatalities deaths_ yll_injuries
Pollution calculations
\(\dot{P}_{m}=\left\{\begin{array}{lr} \gam ma_m & P=1 \\ \tilde{\gamma}_m, \ | tilde{\gamma}_m \sim \text{Dir}(f(P,\ | gamma_m)) & P<1 \end{array}\right.\) Emissions by mode | | trans_emissions
\(\tilde{P}_{m,s}=\dot{P}_m\hat{T} _{ m,s}/\hat{T}_{m,s=\text{baseline}}\) Emissions by mode in scenario relative to baseline trans_emissions
$ t{P}{s}= _{m,s}$ Emissions scalar for scenarios baseline_sum
\(\bar {P}_s=\eta(\zeta\hat{P}_s + 1-\zeta)\) | Background PM2.5 concentration in scenarios | scenario_pm
\(K_s=C_4-C_5\log(\bar{P_s})\) On-road PM2.5 exposure ratio on_road_off_road_ratio
\(\tilde{K}_{m\ in\{\text{walk,cyc.,mot.}\},s}=K_s\) On-road PM2.5 exposure ratio for open vehicle on_road_off_road_ratio
\(\ tilde{K}_{m\in\{\text{sub.}\},s}=C_6\) | Subway PM2.5 exposure ratio | subway_ratio
\(\ tilde{K}_{m\not\in\{\text{walk,cyc. ,mot.,sub.}\},s}=C_2C_3+K_s(1-C_3)\) In-vehicle PM2.5 exposure ratio in_vehicle_ratio
\(V_m=C_1+\frac{1}{2}\lambda_m\) Ventilation rates vent_rates
\(\tilde {V}_{i,m,s} = \sum_{j:i(j)=i,m(j) =m ,s(j)=s}T_j\cdot V_{m(j)}/Z_{m(j)}\) | In-travel air inhaled |
\(\bar{V}_{i,s} = \sum_{m}\t il de{V}_{i,m,s}\cdot \tilde{K}_{m,s}\) In-travel PM2.5 inhalation rate
$ {i,s} = C_1(24-T_j/Z_{m(j)})$ Non-travel air inhaled
${i,s} = { P}s({V}{i,s}+)/24 Total PM2.5 inhaled per hour
%/(*m
*+_{i,s})$
\(\tilde{H}_{a,h,w=3}=\text{CDF}_{H_{a ,h ,w=3}}^{-1}(\xi_{h,w=3})^{\dagger}\) | AP dose– response–curve parameter | DR_AP_LIST
\(\tilde{H}_{a,h,w=3} =\te xt{CDF}_{H_{a,h,w=3}|\tilde{H}_{a ,h ,w=3}}^{-1}(\xi_{h,w=3})^{\dagger}\) AP dose– response–curve parameter DR_AP_LIST
\(\tilde{H }_{a,h,w=3}=\text{CDF}_{H_{a,h, w=3} |\tilde{H}_{a,h,w=3},\tilde{H}_{a ,h ,w=3}}^{-1}(\xi_{h,w=3})^{\dagger}\) | AP dose– response–curve parameter | DR_AP_LIST
\(\tilde{H}_{a,h,w=3}=\text{CD F}_{H_{a,h,w=3}|\tilde{H}_{a,h, w=3} ,\tilde{H}_{a,h,w=3},\tilde{H}_{a ,h ,w=3}}^{-1}(\xi_{h,w=3})^{\dagger}\) AP dose– response–curve parameter DR_AP_LIST
\(\check{H}_{h,i,s} = 1 + \tilde{H}_{a=a(i),h,w=3}\times\)
\(\left\{1 - \exp\left(-\tilde{H }_ {a=a(i),h,w=3} (\check{W}_{i,s} - \ tilde{H}_{a=a(i),h,w=3}) ^ {\tild e{ H}_{a=a(i),h,w=3} }\right)\right\}\) RR of disease given pollution RR_AP_ calculations
Physical activity calculations
\(\dot{Y}_{a,g}=\ text{Prob}(Y_i=0)|_{a(i)=a,g(i)=g}\) Raw probability non-travel MMETs = 0 for demographic group \(a,g\) raw_zero
\(\b ar{Y}_{a,g}=\left\{\begin{array }{lr } \dot{Y}_{a,g} & \tilde{Y}=1 \\ \t ext{CDF}_{\text{Beta}(f(\tilde{Y},\ dot{Y}_{a,g}))}^{-1}(\tilde{\thet a} ) & \tilde{Y}<1 \end{array}\right.\) Probability non-travel MMETs = 0 for demographic group \(a,g\) zeros
\(\check{Y}_i = \left\{\begin {a rray}{lr} 0 & \bar{Y}_{a(i),g(i)} \ \ \text{Sample}(\{Y_{i'}\}|_{a( i')= a(i),g(i')=g(i),Y_{i'}>0}) & 1-\b ar {Y}_{a(i),g(i)} \end{array}\right.\) Non-travel MMETs SYNTHETIC_ POPULATION$ work_l tpa_marg_met
\(M_{i,s}=\sum_{j :i(j )=i,s(j)=s}(\chi T_j\cdot \lambda _{ m(j)}/Z_{m(j)})+\theta \check{Y}_i\) Total MMETs per person mmets
\(\tilde{G}_{h,x}(m) = {G}_{h, x,z= \lfloor{m}\rfloor}+({G}_{h,x,z=\l ce il{m}\rceil}-{G}_{h,x,z=\lfloor{m}\ rfloor})\frac{m-\lfloor{m}\rfloor }{ \lceil{m}\rceil-\lfloor{m}\rfloor}\) Interpolated dose–response parameters rr, lb, ub
\(F_S(s_x) = \mathcal{N}(s_{x=2},( s_ {x=3}-s_{x=1})/1.96)|_{0<F_S(s_x)}\) Truncated normal function
\(W_{h,i,s}=\text{CDF}_{F_S\ left (\tilde{G}_{h,x}(M_{i,s})\right)} ^{ -1}\left(\phi_{h}\right)^{\dagger}\) Relative risk of disease given PA RR_PA_ calculations
Burden-of-disease calculations
\(\tilde{W}_{h, i, s}=W_{h,i,s}\cdot\check{H}_{h,i,s}\) Combined PA and AP relative risks RR_PA_AP_ calculations
\(\hat{W}_{a,g,h,s}= \su m_ {i:a(i)=a,g(i)=g}\tilde{W}_{h,i,s}\) Populati on-attributable fractions pif_temp
\(\bar{W}_{a,g,h,s}= (\hat{W} _{a, g,h,s=\text{scen1}}-\hat{W}_{a,g, h, s})/\hat{W}_{a,g,h,s=\text{scen1}}\) Populati on-attributable fractions relative to scenario 1 pif_scen
\(\bar{U}_{a,g,h,o}=U_{a ,g ,h,o}\frac{\bar{N}_{a,g}}{N_{a,g}}\) GBD scaled to local population DISEASE_BURDEN
\(\tilde{U}_{a, g, h,o}=\rho_h\cdot \bar{U}_{a,g,h,o}\) Scaled background disease for Neoplasms, IHD, Lung cancer gbd_data_scaled
\(\hat{U}_{a,g,h,o,s}=\bar{ W} _{a,g,h,s}\cdot\tilde{U}_{a,g,h,o}\) Combined health and PIF yll_dfs, death_dfs
\(\hat{U}_{a,g,h=\text{road injury },o,s}=\sum_m\check{I}_{a,g,m,o,s}\) Injury health burden inj

Uncertainty in travel data

We model uncertainty in travel via the definition of “travel attributes” (in the terminology of Mohammadian, Javanmardi, and Zhang (2010)). Specifically, we define the propensity to travel in a day, and the distance travelled in a day, given that travelled occurred. Each travel attribute is defined per demographic group, per mode.

In contrast to other travel-related synthetic populations, we have four demographic groups (rather than clusters) which consist of individuals (rather than households). A crucial difference between our objectives and the existing work on simulating populations in travel-demand models (that I’ve found so far) is that we don’t need trip-level data for our calculations. Simulating trip-level data is computationally expensive and high-dimensional in terms of uncertain parameters. For our downstream computations, we need only summary data: travel per person per mode. Beyond scenario generation, there is no need for us to consider trips.

E.g. Saadi et al. (2016) focuses on (a) synthetic population via matching covariates and (b) the sequence of trips taken by each person. (a) isn’t a big thing for Accra; we’re only using two covariates. And for (b), I think we can focus on people as it’s a lot simpler. For our purposes, the things in this work that might be useful to us include matching principles for (a), when we revisit it, and distributions (in particular joint distributions) for travel. More useful to us would be distributions for total travel, but I haven’t found any in this literature (e.g. zero-inflated compound Poisson distribution).

In contrast, my main questions are:

  • How to account for multi-mode trips.

  • What to do about zero travel for e.g. older cyclists. One possibility is to smooth over all modes and/or demographic groups.

  • In the ITHIM-R programme, are bus and truck drivers participants?

  • How should we include correlations between modes? This could also be aided by a smooth model across modes.

  • Whether there is a better method to get distance travelled per person than by resampling raw data.

Method

Summarise the travel survey

Our first step is to summarise the demographic information and the travel survey. In Table 5 are the demographic data: the populations of the city, and the number of people in the survey. In Table 6 are the summary statistics from the travel survey which we will use to generate synthetic travel. From the raw travel data (actually the adjusted trip set, which is the raw travel plus motorcycle, bus and truck trips), we calculate the probability to travel, \(B_{a,g,m}\), as the number of people who travelled by that mode divided by the number of people in the demographic group. It is to these values that we assign uncertainty through the variable PROPENSITY_TO_TRAVEL.

Population by demographic group.
Age Gender Surveyed Accra population
15–49 Male 279 5,847,716
50–69 Male 56 1,016,476
15–49 Female 328 6,360,535
50–69 Female 69 1,110,037
Travel summary table.
Age Gender Mode Probability in raw trip set Probability in baseline trip set
15-49 Male Bus 0.4014 0.3862
50-69 Male Bus 0.4286 0.3636
15-49 Female Bus 0.3506 0.3495
50-69 Female Bus 0.4203 0.4203
15-49 Male Taxi 0.0681 0.0655
50-69 Male Taxi 0.0357 0.0303
15-49 Female Taxi 0.0671 0.0669
50-69 Female Taxi 0.1159 0.1159
15-49 Male Walking 0.6559 0.631
50-69 Male Walking 0.4643 0.3939
15-49 Female Walking 0.5427 0.541
50-69 Female Walking 0.4203 0.4203
15-49 Male Private Car 0.1254 0.1207
50-69 Male Private Car 0.1964 0.1667
15-49 Female Private Car 0.0549 0.0547
50-69 Female Private Car 0.058 0.058
15-49 Male Bicycle 0.0179 0.0172
50-69 Male Bicycle 0 0
15-49 Female Bicycle 0.003 0.003
50-69 Female Bicycle 0 0
15-49 Male Short Walking 0 0.3862
50-69 Male Short Walking 0 0.3636
15-49 Female Short Walking 0 0.3495
50-69 Female Short Walking 0 0.4203
15-49 Male Motorcycle 0 0.0621
50-69 Male Motorcycle 0 0.1515
15-49 Female Motorcycle 0 0.0061
50-69 Female Motorcycle 0 0.0145

Adjust for trip weight

We define a trip-mode weight per mode, \(L_m\), so that a single trip by mode \(m\) in the baseline scenario represents \(L_m\) trips in reality. If in a scenario the mode of the trip is changed (\(m'\rightarrow m\)), it retains the weight of its original mode \(m'\). We use the trip-mode weight to (a) sample trips for scenario generation and (b) scale the probabilities to travel. Note that we don’t use it as trip weights are ordinarily used: to scale trip distance/duration. The result is a scaled set of probability parameters: \[\hat{B}_{a,g,m} = L_m{B}_{a,g,m}.\] (\(L_m\) should actually be the average trip-mode weight for all trips in the scenario. For baseline, it will be the same. It can/should differ for scenarios. Need to work out notation.)

Smooth probabilities

We estimate smoothed probabilities for any mode that has zero or one probability, using the rest of the probabilities for reference, replacing \(\hat{B}_{a,g,m}\). The reference to other modes is slightly problematic if e.g. in the scenario all travel increases. It means a 0 in the baseline is not the same as a 0 in the scenario. This could be fixed with more rigid coding.

Resample probabilities

We resample the adjusted probability parameters of Table 6. We describe each parameter with a Beta distribution as follows: \[\begin{aligned} \label{pitilde} \tilde{B}_{a,g,m} & \sim \text{Beta}(\alpha_{a,g,m},\beta_{a,g,m}); \\ \alpha_{a,g,m}+\beta_{a,g,m} & = 200; \\ \hat{B}_{a,g,m} & = \frac{1}{1+\frac{\beta_{a,g,m}}{\alpha_{a,g,m}}}.\end{aligned}\] Here, \(\tilde{B}_{a,g,m}\) is the new, variable parameter that we resample for each simulation. We sample via a quantile, \(\nu_m\sim\mathcal{U}(0,1)\), which we use to calculate \(\tilde{B}_{a,g,m}\). We set \(\alpha_{a,g,m}\) and \(\beta_{a,g,m}\) to sum to 200, making the distributions concentrated, and we constrain their relative values to enforce the distribution’s mean to equal \(\hat{B}_{a,g,m}\) (the right-most column in Table 6).

Inform individual travel

Each individual is assigned their own propensity to travel per mode, \(\delta_{i,m}\), which is a uniformly distributed random variable that is fixed and not resampled. Whether or not individual \(i\) travels by mode \(m\) depends on \(\tilde{B}_{a(i),g(i),m}\), which does vary: specifically, individual \(i\) travels by mode \(m\) if \(\delta_{i,m}<\tilde{B}_{a(i),g(i),m}\). (To mitigate the effects of these fixed random variables, a large population is required, e.g. more than 5000 individuals.)

\[Distributions for propensities to travel. The propensity is the probability that a member of a demographic group will travel by a particular mode on a particular day. In orange are the raw probabilities based on the synthetic trip set, here shown for the baseline scenario. Beta distributions are defined via a pointiness parameter, set to 200, such that $\alpha+\beta=200$.\]

(){#propensity_to_travel width=“70%”}

Extrapolating to one week of travel

In order to extrapolate to one week of travel, we consider “trips” only at the stage of scenario generation, and proceed with individual-level statistics. Then, in our synthetic population, we have “duration travelled per mode” columns. None of the core ITHIM-R functions will use the “synthetic trip set.”

To get duration per mode per person, we use, for each demographic-group–mode combination, the probability to travel, and the density of the total duration travelled given that travel occurred. To get from one day to one week, consider two cases: one, each individual follows the same pattern of travel every day. That is, we sample one day’s travel and apply it to seven days. Two, each individual follows the travel pattern of their demographic group on each day. That is, we sample seven times from the group’s density to make up seven days of travel.

In implementation, we blend individual and group-level travel through a “repetitiveness” parameter, \(\pi\). Each individual has two travel sets: one from their simulated daily travel pattern, and one from the shared, convolved distribution. These travel sets are weighted by \(\pi\) and \(1-\pi\), respectively. \(\delta_{i,m}\) defines the quantile taken for both the individual density, which is scaled up to one week, and the shared distribution, which is the convolution of the raw density seven times.6 The densities are shown in Figure 17.

\[Raw distributions for duration travelled per day (left) from the travel survey. Convolution to one week is shown on the right. We use ""walking"" mode for demographic group "Male", "15--49".\]

(){#raw_densities width=“80%”}

Simulating a synthetic population

  1. Choose a population size of individuals \({N}\). Label each individual \(i=1,...,N\).

  2. Allocate each individual to a demographic group according to the proportions in the Accra population (Table 5). Then each demographic group has number \(\tilde{N}_{a,g}\) and \(\sum_{a,g}\tilde{N}_{a,g}=N\).

  3. Generate one standard uniform variable per person per mode, \(\delta_{i,m}\sim\mathcal{U}(0,1)\). \(\delta_{i,m}\) is individual \(i\)’s propensity to travel by mode \(m\).

  4. \(\tilde{B}_{a,g,m,s}\) is the propensity of group {\(a,g\)} to travel by mode \(m\) in scenario \(s\). It is the proportion of raw person–duration values that are zero. It is parametrised by \(\alpha_{a,g,m,s}\) and \(\beta_{a,g,m,s}\) (Equation \[pitilde\]).

  5. Let \(\mathcal{F}_{a,g,m,s}\) be the raw density for raw person–duration values for demographic group {\(a,g\)} and mode \(m\) in scenario \(s\): \(\mathcal{F}_{a,g,m,s}=\left\{\sum_{j:i(j)=i,m(j)=m,s(j)=s} T_j\right\}_{i:a(i)=a,g(i)=g}\) i.e. we isolate all individuals \(i\) in group \(\{a,g\}\) and sum their trip durations by mode \(m\) in scenario \(s\).

  6. Let \(\tilde{\mathcal{F}}_{a,g,m,s}\) be that density corrected to have proportion \(\tilde{B}_{a,g,m,s}\) zeros.

  7. Define each individual’s travel duration as \(D_{i,m,s}=7\cdot\text{CDF}_{\tilde{\mathcal{F}}_{a(i),g(i),m,s}}^{-1}(\delta_{i,m})\) (see Figure 18 for resulting distributions).

For sampling, we loop over steps 6–7, where each sample has unique \(\tilde{B}_{a,g,m,s}\) values. In total, the number of uncertain variables is one per mode.

Simulating a synthetic population with repetitiveness parameter & uncertain extrapolation

  1. Choose a population size of individuals \({N}\). Label each individual \(i=1,...,N\).

  2. Allocate each individual to a demographic group according to the proportions in the Accra population (Table 5). Then each demographic group has number \(\tilde{N}_{a,g}\) and \(\sum_{a,g}\tilde{N}_{a,g}=N\).

  3. Generate one standard uniform variable per person per mode, \(\delta_{i,m}\sim\mathcal{U}(0,1)\). \(\delta_{i,m}\) is individual \(i\)’s propensity to travel by mode \(m\).

  4. \(\tilde{B}_{a,g,m,s}\) is the propensity of group {\(a,g\)} to travel by mode \(m\) in scenario \(s\). It is the proportion of raw person–duration values that are zero. It is parametrised by \(\alpha_{a,g,m,s}\) and \(\beta_{a,g,m,s}\) (Equation \[pitilde\]).

  5. Let \(\mathcal{F}_{a,g,m,s}\) be the raw density for raw person–duration values for demographic group {\(a,g\)} and mode \(m\) in scenario \(s\): \(\mathcal{F}_{a,g,m,s}=\left\{\sum_{j:i(j)=i,m(j)=m,s(j)=s} T_j\right\}_{i:a(i)=a,g(i)=g}\) i.e. we isolate all individuals \(i\) in group \(\{a,g\}\) and sum their trip durations by mode \(m\) in scenario \(s\).

  6. Let \(\tilde{\mathcal{F}}_{a,g,m,s}^7\) be the week convolution of \(\tilde{\mathcal{F}}_{a,g,m,s}\) (see Figure 17, right).

  7. Define each individual’s travel duration as \(D_{i,m,s}=7\pi\cdot\text{CDF}_{\tilde{\mathcal{F}}_{a(i),g(i),m,s}}^{-1}(\delta_{i,m})+(1-\pi)\cdot\text{CDF}_{\tilde{\mathcal{F}}_{a(i),g(i),m,s}^7}^{-1}(\delta_{i,m})\) (see Figure 18 for resulting distributions).

For sampling, we loop over steps 6–8, where each sample has unique \(\tilde{B}_{a,g,m,s}\) and \(\pi\) values. In total, the number of uncertain variables is one per mode plus one.

\[Samples of total distances travelled by mode and scenario.\]

(){#distances_beta width=“80%”}

Results

\[YLLs in the scenario minus YLLs in the baseline for Accra, with motorcycle probability scalar 2 in Accra.\]

(){width=“80%”}

\[EVPPI for Accra's "walking" scenario for all causes of YLLs excluding "all cause" and "neoplasm", with propensity to motorcycle = 2.\]

(){#evppi_mode_propensities_200 width=“70%”}

Glossary by letter

\(\text{Letter}\) Meaning \(\text{Letter}\) Meaning \(\text{Letter}\) Meaning
\(\alpha\) \(a\) age \(A\)
\(\beta\) \(b\) \(B\) raw travel probabilities
\(\gamma\) emission uncertainty \(c\) \(C\) ventilation constants
\(\delta\) propensity to travel \(d\) demographic group \(D\)
\(\epsilon\) walk-to-bus time \(e\) \(E\) injury distance exponent
\(\zeta\) traffic PM2.5 fraction \(f\) \(F\) PA DR function
\(\eta\) PM2.5 concentration \(g\) gender \(G\) PA DR curves
\(\theta\) PA uncertainty \(h\) disease \(H\) AP DR curves
\(\iota\) \(i\) individual \(I\) injuries
$$ \(j\) trip \(J\)
\(\kappa\) injury regression parameter \(k\) \(K\) AP exposure
\(\lambda\) travel MMETs \(l\) \(L\)
\(\mu\) mode distances \(m\) mode \(M\) MMETs per person
\(\nu\) \(n\) \(N\) population
\(\xi\) AP DR quantile \(o\) outcome \(O\)
\(\pi\) travel repetitiveness \(p\) \(P\) vehicle emissions
$$ \(q\) \(Q\) person weights
\(\rho\) chronic disease scalar \(r\) \(R\) death-to-injury ratio
\(\sigma\) injury reporting rate \(s\) scenario \(S\)
\(\tau\) \(t\) \(T\) trips
\(\upsilon\) \(u\) \(U\) burden of disease
\(\phi\) PA DR quantile \(v\) \(V\) AP per person
$$ \(w\) AP DR parameter \(W\) relative risks
\(\chi\) day-to-week travel scalar \(x\) PA DR parameter \(X\) injury model matrix
\(\psi\) casualty fraction of \(\omega\) \(y\) year \(Y\) individual MMETs
\(\omega\) sum of distance exponents \(z\) PA dose \(Z\) speeds

:::

Burnett, Richard T., C. Arden Pope, Majid Ezzati, Casey Olives, Stephen S. Lim, Sumi Mehta, Hwashin H. Shin, et al. 2014. An integrated risk function for estimating the global burden of disease attributable to ambient fine particulate matter exposure.” Environmental Health Perspectives 122 (4): 397–403. https://doi.org/10.1289/ehp.1307049.
Garcia, Leandro, Tessa Strain, Ali Abbas, Matthew Pearce, Alexander Mok, Alessio Crippa, Yvonne Laird, et al. 2018. “Physical Activity and Risk of Cardiovascular Disease, Cancer, and Mortality: A Dose-Response Meta-Analysis of Prospective Studies.” PROSPERO: International Prospective Register of Systematic Reviews. https://doi.org/https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=95481.
Goel, Rahul, Shahzad Gani, Sarath K Guttikunda, Daniel Wilson, and Geetam Tiwari. 2015. On-road PM2.5 pollution exposure in multiple transport microenvironments in Delhi.” Atmospheric Environment 123: 129–38. https://doi.org/10.1016/j.atmosenv.2015.10.037.
Jackson, Christopher, Anne Presanis, Stefano Conti, and Daniela De Angelis. 2017. Value of Information: Sensitivity Analysis and Research Design in Bayesian Evidence Synthesis.” arXiv. https://arxiv.org/abs/1703.08994.
Mohammadian, Abolfazl (Kouros), Mahmoud Javanmardi, and Yongping Zhang. 2010. Synthetic household travel survey data simulation.” Transportation Research Part C 18: 869–78. https://doi.org/10.1016/j.trc.2010.02.007.
Saadi, Ismaïl, Ahmed Mustafa, Jacques Teller, and Mario Cools. 2016. Forecasting travel behavior using Markov Chains-based approaches 69: 402–17. https://doi.org/10.1016/j.trc.2016.06.020.

  1. Ainsworth compendium 2011 sites.google.com/site/ compendiumofphysicalactivities for walking and cycling↩︎

  2. For any person of age lower than 25, we set the relative risk to 1.↩︎

  3. The four parameters refer, in numerical order, to \(\alpha, \beta, \gamma\) and tmrel in Burnett et al. (2014).↩︎

  4. This is a problem for the examples of Delhi and Bangalore, whose trip sets have car travellers under the age at which driving is permitted and who therefore should not contribute to striking distance.↩︎

  5. Note that the assymmetry in the offset term for the whw expression arises due to the absence of demographic information for striking vehicles. Were we to have that, the offset would read \(\log(\bar{T}_{a,g,m_{\text{cas}},s})+(E_{m_{\text{cas}}}-1)\log(\hat{T}_{m_{\text{cas}},s})+\log(\bar{T}_{a,g,m_{\text{str}},s})+(E_{m_{\text{str}}}-1)\log(\hat{T}_{m_{\text{str}},s})\). For our case, the latter two terms simplify to \(\log(\hat{T}_{m_{\text{str}},s})+(E_{m_{\text{str}}}-1)\log(\hat{T}_{m_{\text{str}},s})=E_{m_{\text{str}}}\log(\hat{T}_{m_{\text{str}},s})\).↩︎

  6. Using the same random variable for both travel sets increases the variability of the resulting dataset. Conversely, using different random variables would result in more homogenous durations, and fewer zeros.↩︎